[analytics-engine] Strip AE-unsupported fields from test datasets at load#2
Draft
mengweieric wants to merge 36 commits into
Draft
[analytics-engine] Strip AE-unsupported fields from test datasets at load#2mengweieric wants to merge 36 commits into
mengweieric wants to merge 36 commits into
Conversation
…opensearch-project#5492) VectorSearchIT.testExecutionWithoutKnnPluginReturnsCapabilityError asserts the "k-NN plugin not installed" capability error, which is only produced on a cluster without the k-NN plugin. The plugin's own JVM integ-test cluster has no k-NN, so the test passes there. The OpenSearch distribution bundles opensearch-knn, so on the distribution integ-test cluster the query reaches k-NN and fails with a different error ("not knn_vector type", HTTP 400), breaking the 3.x distribution integ-test run across all platforms. Guard the test with Assume.assumeFalse(isKnnPluginInstalled()) so it runs only on the no-k-NN cluster it was written for, mirroring the inverse assumeTrue guard already used by VectorSearchExecutionIT. No production code changes. Signed-off-by: ahkcs <huangkaics@gmail.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Finn Carroll <carrofin@amazon.com>
…rch-project#5479) * Fix transpose VARCHAR length drift in unpivot/pivot lowering PPL `transpose` lowers via `RelBuilder.unpivot()` + `pivot()`. The unpivot synthesizes a VALUES leaf carrying axis literals (the input field names), e.g. `VALUES('firstname'), ('age'), ('balance')`. Calcite types each RexLiteral as `CHAR(literalLen)` and types the VALUES column as `CHAR(maxAxisLiteralLen)` — the longest literal wins on column-level type inference. This bites the analytics-engine route end-to-end: 1. After unpivot the `column` axis column is `CHAR(9)` (from "firstname"). Through Calcite's TRIM (`TO_VARYING`) it becomes `VARCHAR(9)`. 2. The `_value_transpose_` value column is built from `CAST(input_field AS VARCHAR)` — unbounded VARCHAR. 3. `MAX(_value_transpose_)` aggCall is created with declared return type = unbounded VARCHAR (inferred from arg 0 at call-construction time). 4. The downstream non-prefix groupSet aggregate (`group=[{1}], MAX($0)`) splits into PARTIAL/FINAL on the analytics path. PARTIAL hoists group keys to the output prefix, so FINAL's `argList=[0]` reads the group-key slot — `VARCHAR(9)` — instead of the agg-state slot. Calcite's `Aggregate.<init>` then runs `typeMatchesInferred` and rejects the plan: declared `VARCHAR` ≠ inferred `VARCHAR(9)`. 5. Even when the aggregate validation passes, the substrait/Arrow path sees `FixedChar(maxAxisLiteralLen)` schema vs runtime arrays whose actual values are shorter (e.g. "age" with length 3) and trips `Row field type (FixedChar{length=3}) does not match schema field type (FixedChar{length=9})`. Two fixes, both in the lowering site: * Build every axis literal at the same `CHAR(maxAxisLiteralLen)` type. Calcite then space-pads the shorter literals at value-construction time, so the runtime CHAR vector and the declared schema both have the same fixed length. The downstream TRIM strips the padding. * Wrap the trimmed-axis group key in an explicit `CAST(... AS VARCHAR)` to unbounded VARCHAR. This makes the group key type match `_value_transpose_`'s unbounded VARCHAR end-to-end, so the aggregate's row-type check sees consistent types regardless of which side the analytics-engine split rule places the group key on. These have to live in sql plugin, not in the analytics-engine planner: the typing decisions are made by Calcite's `RelBuilder.unpivot()` implementation when it constructs the VALUES leaf — long before any analytics-engine rule sees the plan. By the time the plan reaches the analytics-engine route, the precision drift is already baked into the RelDataType chain. Fixing it downstream would require pattern-matching on transpose-shaped sub-trees inside the planner, which is fragile and mis-attributes the root cause. The lowering author owns the type contract for the operators it emits. Adds: - `testTransposeColumnAxisUsesUnboundedVarchar` regression assertion pinning the output `column` field's type to unbounded VARCHAR. Catches any future change that re-introduces axis-literal precision into the group key. - Updated plan-shape assertions across the existing transpose tests to reflect the padded axis literals (`'cnt '`, `'COMM '`, etc.) and the `CAST(TRIM(...) AS VARCHAR)` group key. Verified end-to-end: `CalciteTransposeCommandIT` 5/5 pass with `tests.analytics.parquet_indices=true`. Signed-off-by: Songkan Tang <songkant@amazon.com> * Update explain_transpose.yaml expected plan for new lowering shape CalciteExplainIT.testTransposeExplain regenerated against the updated transpose lowering: axis literals are now padded to a uniform CHAR(N) (N = max axis literal length, i.e. 14 for 'account_number'), and the group-key TRIM output is wrapped in a CAST(... AS VARCHAR) to unbounded VARCHAR. The plan-shape diff exactly mirrors the documented behavior change in the parent commit: * `$f20=[TRIM(...)]` → `$f20=[CAST(TRIM(...)):VARCHAR NOT NULL]` * axis literals e.g. 'firstname' → 'firstname ' (padded) * LogicalValues row type tuples are correspondingly padded Verified locally: `./gradlew :integ-test:integTestRemote --tests "*CalciteExplainIT.testTransposeExplain"` passes. Signed-off-by: Songkan Tang <songkant@amazon.com> --------- Signed-off-by: Songkan Tang <songkant@amazon.com>
…rch-project#5501) * test(integ-test): exclude AnalyticsEngineCompatIT from the main integTest task AnalyticsEngineCompatIT (package org.opensearch.sql.plugin) is a smoke test that only makes sense against a cluster bundling the analytics-engine plugin stack. It is provisioned and run by the dedicated :analyticsEngineCompatIT task (testClusters.analyticsEngineCompat), which adds arrow-base, arrow-flight-rpc and analytics-engine. The main integTest task selects tests by exclusion and excluded org/opensearch/sql/security/** ("executed in another task") but never excluded the plugin package, so the smoke test also ran against the plain integTest/remoteCluster (neither of which has analytics-engine). There it is useless and exposed to suite-wide infra flakiness; in a recent CI run it was the descriptor recorded as failed after the test JVM wedged for ~4.5h. Mirror the existing security/** exclusion for org/opensearch/sql/plugin/**. Signed-off-by: Kai Huang <ahkcs@amazon.com> * test(integ-test): narrow exclusion to the AnalyticsEngineCompatIT class Address review feedback: exclude the specific test class rather than the whole org/opensearch/sql/plugin/** package, matching the per-class exclude style already used in this task (e.g. legacy/ExplainIT.class). This keeps the exclusion intentional and avoids silently dropping any future test added to that package. Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com>
Add UNION ALL set operations to the Calcite-based unified query path. The legacy V2 SQL engine rejects UNION with a SyntaxCheckException so queries fall back to the legacy engine appropriately. Signed-off-by: Chen Dai <daichen@amazon.com>
) SELECT col, COUNT(*) AS cnt only renames a column without changing the set or order, so RelBuilder.project (force=false) skips emitting the LogicalProject and the alias is dropped. Force the projection when an AS renames a field; the PPL path is unaffected since it never produces an AS RexCall through visitProject. Signed-off-by: Chen Dai <daichen@amazon.com>
…rch-project#5510) AnalyticsEngineCompatIT is a coexistence smoke test meant to run only in the dedicated :integ-test:analyticsEngineCompatIT task, which bundles the analytics-engine plugin stack. The distribution integ-test pipeline, however, scans every *IT class against the built distribution and runs it with the security plugin enabled and without analytics-engine. There it fails during REST client init with "ConnectionClosedException: Connection closed by peer", because the test extends a bare OpenSearchRestTestCase that speaks plain HTTP to the TLS-secured port. A build.gradle exclude does not cover that pipeline. Make the test self-inert regardless of how it is discovered: - Extend OpenSearchSQLRestTestCase so the REST client honours the https/user/ password system properties of a secured cluster. - Skip via an assumption when the analytics-engine plugin is not installed, so a build without the plugin reports the test as skipped rather than failed. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…rix (opensearch-project#5476) These 10 PPL/Calcite integration tests mutate the shared test index by issuing PUT /<idx>/_doc/<id> + DELETE /<idx>/_doc/<id> around the query under test. When the suite runs with -Dtests.analytics.parquet_indices=true, test indices are backed by DataFormatAwareEngine, whose delete() and prepareDelete() deliberately throw "delete operation not supported." — parquet/composite storage is append-only by design. The tests pass under the Lucene-backed default. Skipping them via Assume.assumeFalse(isAnalyticsParquetIndicesEnabled()) follows the precedent already established in StatsCommandIT and CalciteAnalyticsDatetimeWireFormatIT, and prevents the spurious bucket-22 "delete operation not supported" failures in the AE-storage report without changing any assertion or test behavior on the default path. Signed-off-by: Jialiang Liang <jiallian@amazon.com>
…s no common type (opensearch-project#5513) * Coerce temporal operands in IN and BETWEEN when leastRestrictive finds no common type visitIn and visitBetween call leastRestrictive directly instead of going through the CoercionUtils path that comparison operators use. leastRestrictive returns no common type when temporal operands are represented differently — e.g. a standard Calcite TIMESTAMP field against EXPR_DATE UDT bounds/values (`ts between date('...') and date('...')`, `ts in (DATE '...', ...)`) — so both predicates rejected such queries with "expression types are incompatible" even though comparisons coerce the same mix. Both now fall back to CoercionUtils.widenArguments (the comparison-operator path) when leastRestrictive yields null, scoped to all-temporal operands so genuinely incompatible mixes (e.g. `age between '35' and 38.5`) still raise SemanticCheckException. Enables the previously-undiscovered testDateBetween (was missing its @test annotation) and adds testDateIn plus a CoercionUtils unit test covering the plain-TIMESTAMP + EXPR_DATE mix. Signed-off-by: Lantao Jin <ltjin@amazon.com> * Compare temporal IN/NOT IN values in the field's timestamp domain instead of string-collapsing them Signed-off-by: Lantao Jin <ltjin@amazon.com> --------- Signed-off-by: Lantao Jin <ltjin@amazon.com>
…search-project#5482) (opensearch-project#5488) * [BugFix] Restore dedup pushdown when combined with WHERE clause (opensearch-project#5482) Run PPLSimplifyDedupRule before FilterMergeRule in the HEP optimizer so the bucket-non-null filter PPL emits for dedup is matched as-is. With the previous order, an upstream user where filter sat adjacent to the bucket-non-null filter; FilterMergeRule fired first and merged them into a conjunction that no longer satisfied PPLSimplifyDedupRule's operand predicate, defeating dedup pushdown to the shard. Use sequential addRuleInstance phases for explicit ordering rather than addRuleCollection, which is documented as non-deterministic in firing order. Adds two regression tests in CalcitePPLDedupTest: one that asserts LogicalDedup is produced under the fixed order, and one that pins the buggy behavior under the swapped order. Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> * [BugFix] Drop issue-link reference from regression-test JavaDoc (opensearch-project#5488) Per maintainer review feedback, the regression-test JavaDoc for testDedupAfterWhereProducesLogicalDedup mentioned the originating issue URL. The remaining JavaDoc paragraphs already describe the bug shape and the rule-ordering invariant, so the explicit issue link is unnecessary noise. Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> * [BugFix] Make dedup simplify operand order-independent (opensearch-project#5488) Address review feedback on opensearch-project#5488: extend mayBeFilterFromBucketNonNull to accept the merged conjunction shape FilterMergeRule produces, so PPLSimplifyDedupRule fires regardless of whether FilterMergeRule has already merged the user where clause into the bucket-non-null filter. PPLSimplifyDedupRule.apply now splits the bottom filter into IS_NOT_NULL conjuncts on partition keys (absorbed into LogicalDedup semantics) and any remaining conjuncts (preserved as a separate filter below the new LogicalDedup), so a user predicate that was folded in is no longer dropped. With the operand predicate order-independent, the HEP rule order is no longer a load-bearing invariant. Revert the addRuleCollection -> addRuleInstance change in CalciteToolsHelper.HEP_PROGRAM that the previous patch introduced. Replace the regression test that pinned the buggy rule order with one that asserts the user-visible contract: with where preceding dedup, a LogicalDedup is produced and the user predicate is preserved regardless of which order FilterMergeRule and PPLSimplifyDedupRule fire. Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> * Address review comments on opensearch-project#5488 Per @penghuo review: PlanUtils.java - Revert mayBeFilterFromBucketNonNull to the original ternary form; drop the early-return refactor (no behavior change, just cleaner). - Drop the !rexCall.getOperands().isEmpty() guard before .get(0): IS NOT NULL is always unary in Calcite, so the check is dead. - Trim the JavaDoc to the essentials (un-merged vs merged-AND shape; concrete partition-key match happens in PPLSimplifyDedupRule#apply). - Promote isNotNullOnRef from package-private to public so the dedup rule can reuse it from a different package. PPLSimplifyDedupRule.java - isNotNullOnPartitionKey now delegates the IS NOT NULL($ref) structural check to PlanUtils.isNotNullOnRef and adds the partition-key index check on top. CalciteExplainIT.java - Add testDedupAfterWherePushDown: an end-to-end regression that runs the shape `... | where <pred> | dedup <field>` and asserts (a) LogicalDedup appears in the explain output (PPLSimplifyDedupRule fired even after FilterMergeRule had a chance to merge the two filters), and (b) EnumerableWindow does NOT appear (the in-memory ROW_NUMBER fallback the bug caused is gone). Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Push user where filter into scan when blocking dedup pushdown PPLSimplifyDedupRule correctly produces Dedup -> Filter(user where) -> Scan when a `where` precedes `dedup`. The Filter between Dedup and Scan blocks DedupPushdownRule's strict Dedup -> Project -> Scan operand chain, so Volcano falls back to PPLDedupConvertRule and the plan ends up with an in-memory ROW_NUMBER window instead of the pushed-down composite + top_hits aggregation. Add a WITH_FILTER operand variant to DedupPushdownRule that matches Dedup -> Filter -> Scan, pushes the filter into the scan, then runs the standard apply() on the resulting Dedup -> Project -> Scan shape. Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Bail when filter is only partially pushable pushDownFilter returns a Filter (not a CalciteLogicalIndexScan) when the predicate analyzer can only partially push the condition. The previous cast would have thrown ClassCastException in that case. Use an instanceof-pattern check so the rule bails out cleanly and leaves the plan untouched, letting other rules handle the residual. Also drop a stale issue-link reference from a test comment. Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Apply spotless formatting to dedup unit test Reflowed the JavaDoc on testDedupAfterWhereProducesLogicalDedupWithProductionHepProgram to match Google Java Format's preferred line break, fixing the spotlessJavaCheck violation that failed the unit-test matrix on CI. Signed-off-by: Jialiang Liang <jiallian@amazon.com> --------- Signed-off-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> Signed-off-by: Jialiang Liang <jiallian@amazon.com> Co-authored-by: ryan-gh-bot <ryan-gh-bot@users.noreply.github.com> Co-authored-by: Jialiang Liang <jiallian@amazon.com>
…ne route (opensearch-project#5522) CalcitePPLAggregationIT.testPercentile, testSumNull, and testSumGroupByNullValue hard-coded expectations from the Calcite DSL-pushdown path, so they failed when run through the analytics-engine (DataFusion) backend via -Dtests.analytics.parquet_indices=true: - percentile() is approximate. DataFusion's t-digest interpolation returns 46576 for percentile(balance, 90) where the OpenSearch/Calcite percentile_approx returns 48086 (p50 agrees). Both are valid approximations. - SUM over an all-null bucket is null per the SQL spec. The DSL-pushdown path returns 0 (a known quirk, opensearch-project#3408); DataFusion follows the spec like Calcite-no-pushdown and returns null. Branch the expected values on the existing isAnalyticsParquetIndicesEnabled() helper, matching the pattern already used in StatsCommandIT.testSumWithNull. No production code change; both engine paths now pass. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…earch-project#5481) (opensearch-project#5515) Signed-off-by: Jialiang Liang <jiallian@amazon.com>
…stSimpleCount0 + APPROX_COUNT_DISTINCT name) (opensearch-project#5525) * Use a parquet-backed index in CalcitePPLAggregationIT.testSimpleCount0 A bare auto-created index isn't composite/parquet-backed, so on the analytics-engine route it doesn't route to the analytics engine. Switch to TEST_INDEX_BANK (loaded via loadIndex, which injects parquet settings when the flag is set, 7 docs) so the test is meaningful on both routes. Diagnosis by Sandesh Kumar. Signed-off-by: Kai Huang <ahkcs@amazon.com> * Emit APPROX_COUNT_DISTINCT as the distinct_count_approx runtime name distinct_count_approx() failed to bind on the analytics-engine (DataFusion) route because the SqlAggFunction was named DISTINCT_COUNT_APPROX; the backend resolves aggregates by the Calcite/Substrait-standard name APPROX_COUNT_DISTINCT. The Java field name and PPL function name are unchanged. The OpenSearch V3 path is unaffected (it overrides this via the external HLL registration). Analytics-route binding is completed by opensearch-project/OpenSearch#22013. Per Sandesh Kumar. Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com>
…artition (opensearch-project#5526) Signed-off-by: Chen Dai <daichen@amazon.com>
opensearch-project#5523) Signed-off-by: Chen Dai <daichen@amazon.com>
…earch-project#5521) * fix(core): cluster D — TIME-typed list elements format as HH:mm:ss Companion to OpenSearch fix/ai/datetime-clusters @ 9009736c2dc. list(<TIME-typed field>) now returns "HH:mm:ss[.fraction]" without the 1970-01-01 epoch-date prefix. The analytics-engine path rewrites PPL list() to DataFusion's list_merge, so the legacy ListAggFunction never fires. Instead, AnalyticsExecutionEngine now post-processes List-typed cells in the result conversion: when an element string matches "1970-01-01[ T]HH:mm:ss[.fraction]", only the time portion is kept. Scalar cells are untouched, preserving the wider timestamp-stringification regression baseline. Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * fix(core): cluster A — span() output column type for date/time UDT Recognize the new sandbox DateOnlyType / TimeOnlyType UDT markers in: - OpenSearchTypeFactory.convertAnalyticsEngineRelDataTypeToExprType: DateOnlyType → ExprCoreType.DATE, TimeOnlyType → ExprCoreType.TIME so the user-visible response schema labels span() bucket columns as `date` / `time` instead of `timestamp`. - AnalyticsExecutionEngine.toExprValue: when the column carries a DateOnlyType marker, strip the trailing ` HH:MM:SS` from the Timestamp(ms)-formatted wire value so dates render as `YYYY-MM-DD`. Symmetric handling for TimeOnlyType strips the `1970-01-01 ` prefix. Pairs with the sandbox schema-builder change in opensearch-project/OpenSearch@b69c5ff8888. Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Preserve DATE/TIME return type for ADDDATE/SUBDATE/ADDTIME/SUBTIME on date/time columns On the analytics-engine route a 'date'/'time' column is a TIMESTAMP-backed DateOnlyType/TimeOnlyType marker, so operand-conditional return-type inference mis-read it as TIMESTAMP — ADDDATE(date_col, N) reported (and rendered) TIMESTAMP instead of DATE. Add isDateExprType/isTimeExprType helpers recognizing those markers (off the general conversion path, no substrait round-trip risk) and use them in AddSubDateFunction and PPLReturnTypes.TIME_APPLY_RETURN_TYPE. Fixes the 6 *Null column-operand cases end-to-end. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Tighten ADDDATE/SUBDATE return-type helper comments Comment trims on top of d12407b (Marc Handalian's cherry-pick) — collapse the multi-line Javadoc on isDateExprType/isTimeExprType to one-line case names, and drop the inline narration at the two call sites (PPLReturnTypes.TIME_APPLY_RETURN_TYPE, AddSubDateFunction#getReturnTypeInference) that restated the helper contract. Behavior unchanged. Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> * Fix tests Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> --------- Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e route (opensearch-project#5527) The 7 comparison-operator tests in OperatorIT / CalciteOperatorIT use the search-filter syntax (source=idx age = 32), which lowers to a Lucene query_string predicate. Executing it on the analytics-engine route fails with a null LuceneReader: query_string needs an inverted-index searcher, which parquet-backed analytics indices don't have (DataFusion is not a full-text engine). Full-text search is genuinely unsupported there. Exclude these tests only when the analytics route is actually enabled (Boolean.parseBoolean(tests.analytics.parquet_indices), matching AnalyticsIndexConfig.isEnabled) so the v2 path still runs them, following the established build.gradle exclusion pattern. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…search-project#5529) With -Dtests.analytics.parquet_indices=true, indices created by a raw document PUT (e.g. `PUT /test/_doc/1` in a test's init()) bypass AnalyticsIndexConfig.applyIndexCreationSettings, so they inherit the composite *value* — and are therefore routed to the analytics engine by RestUnifiedQueryAction.isAnalyticsIndex — but not the `pluggable.dataformat.enabled` flag. They are then stored as a plain-Lucene EngineBackedIndexer whose acquireReader() is unimplemented, and the query fails with `StreamException[INTERNAL] Failed to start streaming fragment`. Apply the cluster-level composite defaults in setUpIndices() so every index — including raw-PUT ones — is stored as a parquet-backed DataFormatAwareEngine that is actually scannable by the analytics engine it routes to. No-op unless tests.analytics.parquet_indices=true, so normal CI is unchanged. Signed-off-by: Kai Huang <ahkcs@amazon.com>
…sting (opensearch-project#5505) (opensearch-project#5508) Signed-off-by: Radhakrishnan Pachyappan <gingeekrishna@gmail.com>
…dataformat setting (opensearch-project#5528) When cluster.pluggable.dataformat=composite, isAnalyticsIndex routed every query to the analytics engine, which cannot serve the system catalog (*_ODFE_SYS_TABLE, .DATASOURCES) that SHOW/DESCRIBE resolve. Detect system-catalog queries (including legacy-syntax SHOW/DESCRIBE that the V2 parser rejects) and keep them on the default pipeline while data queries continue to the analytics engine. Also log query routing to the analytics engine at both call sites. Signed-off-by: Chen Dai <daichen@amazon.com>
…analytics runs (opensearch-project#5537) The analytics engine merges per-shard Arrow batches in arrival order (no shard-ordinal tiebreaker), so any IT whose asserted rows depend on document encounter order fails when test indices have more than one primary shard. Add an explicit sort on a unique key to the affected queries so results are deterministic on every route; where the sorted result differs from insertion order (accounts.json is not account_number-ascending), update expectations to the sorted rows. Also add the tests.analytics.num_shards system property (default 1, wired through integTestRemote and TestUtils.AnalyticsIndexConfig) so multi-shard coverage runs can be reproduced: ./gradlew :integ-test:integTestRemote \ -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9300 \ -Dtests.clustername=runTask \ -Dtests.analytics.parquet_indices=true -Dtests.analytics.num_shards=3 Verified on three routes: 3-shard analytics, 1-shard analytics, and the regular Calcite route (self-managed cluster, all green). Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
…oject#5532) UnifiedQueryPlanner.plan() let field-not-found (ErrorReport) and Calcite validation errors such as table-not-found (CalciteException) fall through to the catch-all and rethrew them as IllegalStateException. Propagate ErrorReport unwrapped, map CalciteException to SemanticCheckException, and log client errors at WARN. Also unwrap ErrorReport to its cause in the SQL error formatter so the reported type reflects the cause, not the wrapper. Signed-off-by: Chen Dai <daichen@amazon.com>
…pe field path (opensearch-project#5536) * Fix opaque NullPointerException for unresolvable alias-type field path When a mapping contains a field of "type": "alias" whose "path" points to a target absent from the flattened mapping (a text multi-field such as field.keyword, or a removed/renamed field), validateAliasType passed a null target into the OpenSearchAliasType constructor, which dereferenced it at super(type.getExprCoreType()) and surfaced an opaque NullPointerException. Guard the null target and throw a SemanticCheckException naming the alias field and its unresolved path. SemanticCheckException extends QueryEngineException, so JdbcResponseFormatter maps it to HTTP 400 (client error) rather than the misleading 500 a generic exception would produce. Add unit tests covering the .keyword multi-field and missing-field cases. Fixes opensearch-project#5535 Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Add integration test and trim unit-test comments for alias path fix Add a QueryValidationIT case asserting that SELECT * over an index whose alias field targets a text multi-field (source.keyword) returns a 400 SemanticCheckException with the descriptive message. An alias pointing at a truly missing field is rejected by OpenSearch at index-creation time, so it is not reachable through the SQL plugin and is covered by the unit test only. Shorten the unit-test comments and drop inline issue references. Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Apply spotless formatting Signed-off-by: Jialiang Liang <jiallian@amazon.com> * Adopt ErrorReport for unresolvable alias path error Wrap the SemanticCheckException in an ErrorReport (the report-builder interface from opensearch-project#5266) so the error carries structured context as it bubbles up: FIELD_NOT_FOUND code, ANALYZING stage, a location chain, the alias field and path as context, and a fix suggestion. On the PPL/Calcite path this renders as a rich structured error; on the SQL JDBC path it still returns a clear 400 (RestSqlAction unwraps to the SemanticCheckException cause), though the JdbcResponseFormatter does not yet render the ErrorReport structure. Update the unit test to assert the ErrorReport code/stage/context/cause, the SQL IT for the ErrorReport type, and add a PPL IT asserting the structured FIELD_NOT_FOUND error in CalciteErrorReportStageIT. Signed-off-by: Jialiang Liang <jiallian@amazon.com> --------- Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Eric Wei <mengwei.eric@gmail.com> Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
…-project#5540) Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
…overage The analytics-engine cannot read nested/geo_point/geo_shape fields, so a handful of test datasets need an _ae sibling that drops those fields from the mapping and the matching keys from each bulk doc. Add TestUtils.AnalyticsIndexConfig#resolveDatasetPath to transparently route a dataset/mapping path to its _ae variant when the config is enabled and the sibling exists on disk; normal CI is byte-for-byte identical otherwise. Ships the _ae variants for datatypes, deep_nested, merge_test_1/2, and nested_simple, plus AnalyticsIndexConfigVariantTests covering the resolution logic. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…load The analytics-engine (DataFusion) backend cannot read nested, geo_point, geo_shape, binary, or alias fields. Rather than maintain per-index _ae dataset variants, strip those fields uniformly at load time on the AE route: - createIndexByRestClient recursively removes unsupported-typed properties from the mapping (including object sub-properties) and reports the top-level fields dropped. - loadIndex passes that set to a new loadDataByRestClient overload, which strips the matching keys from each bulk source doc so mapping and data agree. Gated on tests.analytics.parquet_indices; byte-for-byte no-op off the AE route. Replaces the earlier filename-_ae hook + hand-authored variant files. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Add an exhaustive, fail-loud verifier IT (AnalyticsUnsupportedFieldStripVerifyIT) that loads every Index enum dataset through the real loadIndex harness on the AE route and asserts no nested/geo_point/geo_shape/binary/alias field survives in any live mapping — converting the "silent expected:<1> but was:<0>" risk into a direct, attributable failure. Skip CalciteAliasFieldAggregationIT on the AE route: it creates its index via a raw inline PUT (bypassing the load-path strip) and its queries reference alias fields directly, so alias aggregation is inherently AE-incompatible — there is nothing to salvage by stripping. Matches the existing Assume.assumeFalse idiom. Full calcite/remote PPL sweep on the AE route now has zero ingestion failures attributable to the five unsupported field types. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
812f87b to
13d1081
Compare
…eserve untouched bulk docs
binary maps to VARBINARY in the analytics-engine schema builder (same as ip)
and is a registered parquet field type, so it scans fine - drop it from the
unsupported-field strip set, which is now {nested, geo_point, geo_shape, alias}.
Stripping binary needlessly mutated fixtures and risked masking a real
binary-handling bug instead of letting the assertion surface it.
Make UNSUPPORTED_FIELD_TYPES the single source of truth (public) and have
AnalyticsUnsupportedFieldStripVerifyIT import it instead of re-listing, so the
stripper and the verifier cannot drift. Add a byte-identity guard to
stripBulkFields: a bulk doc line is only re-serialized when it actually carried
a dropped key, so untouched docs pass through verbatim.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
13d1081 to
d076417
Compare
The unsupported-field strip removes nested/geo_point/geo_shape/alias columns from test data+mappings on the AE route, so PPL tests whose queries reference such a column can never pass - the data is gone. Physically exclude them in integ-test/build.gradle (gated on tests.analytics.parquet_indices), centralized with the existing AE excludes and grouped comments, rather than letting them fail. Verified field-by-field against the index mappings; only doomed work is removed: - GeoPointFormatsIT (whole class - every test reads a geo_point field). - DataTypeIT.test_nonnumeric_data_types (nested_value + geo_point_value). - DataTypeIT.test_alias_data_type (where alias_col, type=alias). - SystemFunctionIT.typeof_opensearch_types (typeof(geo_point_value)). Cleared as NOT doomed (kept running): GeoIpFunctionsIT (geoip() over text ip/name), StatsCommandIT.testStatsNested* (nested function calls, not a nested field), ConvertCommandIT/TrendlineCommandIT/ExplainIT alias usages (result aliases, not alias-typed fields). Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…loud, stronger verifier
Resolves the correctness holes raised in review.
Blocking 1 — path-aware mapping+source strip. stripUnsupportedMappingFields
now returns exact dropped PATHS (e.g. [location, point]) instead of only
top-level names; stripBulkFields removes those paths recursively from _source,
descending through objects AND arrays of objects, preserving siblings
(location.city etc.). Adds unit coverage for a complex_geo shape and arrays of
objects.
Blocking 2 — fail loud on bulk item failures. loadDataByRestClient now parses
the _bulk response and throws on errors=true (naming the index + first item
errors), for ALL test bulk loads. Silent partial ingestion no longer slips
through a 200 status.
Blocking 3 — verifier proves the invariant. AnalyticsUnsupportedFieldStripVerifyIT
deletes each index first so the real load path always runs (not short-circuited
by isIndexExist), then per dataset checks: clean live mapping, no stripped path
left in sampled _source, and doc-count == source-doc count (tolerating a
cluster-side count error on system indices with a transparent logged skip).
NB1 — GeoPointFormatsIT exclude moved under the parsed analyticsEnabled gate
(was gated on mere property presence, so =false wrongly excluded it).
NB2 — out-of-scope skip (join) now refuses to skip if the mapping also declares
any unsupported type, so a mixed failure can't be hidden.
binary is now conditional: kept when store:true (engine reads VARBINARY), but
stripped when store:false because the parquet store can't create it
("Unable to derive source for [X] with store disabled", verified on the AE
cluster). Confirmed end-to-end: the verifier passes over all Index datasets on a
force-routed AE cluster.
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
…in build.gradle Per review: keep all AE-route test exclusions in one place. Move the inline Assume.assumeFalse(isAnalyticsParquetIndicesEnabled()) guard out of CalciteAliasFieldAggregationIT into the gated analyticsEnabled exclude block in integ-test/build.gradle, alongside the other doomed PPL ITs. The rationale (raw-PUT alias index can't be created on the AE route) is carried into both the build.gradle comment and a short note left at the test's init(). Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
On the analytics-engine route (
-Dtests.analytics.parquet_indices=true), seeding a test dataset whose mapping contains a field type the parquet/composite store cannot scan (nested,geo_point,geo_shape,alias) fails ingestion. The failure surfaces later as an unrelatedexpected:<1> but was:<0>in a downstream IT rather than at the offending index/field — masking real assertions.Fix
Strip unsupported-typed fields uniformly at load time, gated on
tests.analytics.parquet_indices; a byte-for-byte no-op off the AE route.createIndexByRestClientrecursively removes{nested, geo_point, geo_shape, alias}properties (including object sub-properties) and reports the dropped top-level fields.loadIndexpasses that set to aloadDataByRestClientoverload that strips the same keys from each bulk source doc, so mapping and data agree. A doc line is re-serialized only when it actually carried a dropped key, so untouched docs stay verbatim.UNSUPPORTED_FIELD_TYPESis the single source of truth;AnalyticsUnsupportedFieldStripVerifyITimports it and fail-loud asserts no listed type survives in any live mapping, so the list can't silently drift.binaryis intentionally not stripped: the engine mapsbinary → VARBINARY(same asip, seeOpenSearchSchemaBuilder.mapFieldType) and it is a registered parquet field type, so it scans fine. Stripping it would needlessly mutate fixtures and could mask a real binary-handling bug.Testing
AnalyticsFieldStripTests(pure-logic, no cluster): recursive mapping strip, bulk-key strip,binarykept, byte-identity of untouched docs, and no-op-when-disabled — 5/5 pass.:integ-test:integTestRemoteagainst a force-routed 9-plugin cluster, incl.AnalyticsUnsupportedFieldStripVerifyITover everyIndexdataset) still pending — draft until that is green.Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.